# Efficient Quantization

Deepseek Ai DeepSeek R1 Distill Qwen 14B GGUF

DeepSeek-R1-Distill-Qwen-14B is a 14B-parameter large language model released by DeepSeek AI. It is a Qwen-based model distilled from DeepSeek-R1, offered here in multiple GGUF quantization versions that trade a little accuracy for lower memory use and faster inference.

Large Language Model

featherless-ai-quants

Medra27b I1 GGUF

A quantized version of Medra27B, offered in multiple quantization types and suitable for fields such as text generation and medical AI.

Large Language Model

Transformers Supports Multiple Languages

Nvidia Llama 3.1 Nemotron Nano 4B V1.1 GGUF

A quantized version of the NVIDIA Llama-3.1-Nemotron-Nano-4B-v1.1 model, processed with llama.cpp tools for various quantization methods, suitable for running in resource-constrained environments.

Large Language Model English

Seed Coder 8B Instruct GGUF

This model is quantized with the output and embedding tensors kept in f16 format and the remaining tensors quantized to q5_k or q6_k format, resulting in a smaller size while maintaining performance comparable to pure f16.

Large Language Model English
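To make the size trade-off in the entry above concrete, here is a rough sketch of how a mixed f16/q6_k file compares with a pure f16 file. The bits-per-weight figures are assumed from the llama.cpp block layouts, and the 1.0B / 7.0B parameter split is hypothetical, chosen only for illustration; it is not taken from the model card.

```python
# Approximate bits per weight for each tensor type. These are assumed values
# derived from the llama.cpp block layouts; real files also carry metadata.
BITS_PER_WEIGHT = {"f16": 16.0, "q6_k": 6.5625, "q5_k": 5.5}


def estimate_size_gib(param_counts: dict[str, float]) -> float:
    """Estimate file size in GiB from parameter counts grouped by tensor type."""
    total_bits = sum(BITS_PER_WEIGHT[t] * n for t, n in param_counts.items())
    return total_bits / 8 / 2**30


# Hypothetical split for an 8B-parameter model: 1.0B parameters in the output
# and embedding tensors (kept in f16), 7.0B in all remaining tensors (q6_k).
mixed = estimate_size_gib({"f16": 1.0e9, "q6_k": 7.0e9})
pure_f16 = estimate_size_gib({"f16": 8.0e9})

print(f"mixed f16/q6_k: {mixed:.2f} GiB")
print(f"pure f16:       {pure_f16:.2f} GiB")
```

Swapping `"q6_k"` for `"q5_k"` in the hypothetical split gives the estimate for the smaller variant.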

Andrewzh Absolute Zero Reasoner Coder 7b GGUF

A llama.cpp quantized version of andrewzh's Absolute_Zero_Reasoner-Coder-7b model, supporting multiple quantization levels and suitable for reasoning and code generation tasks.

Large Language Model

Qwen3-14B-AWQ is the 4-bit AWQ quantized version of Qwen3-14B, a large language model in the Qwen series. It supports seamless switching between reasoning and non-reasoning modes and offers strong reasoning, instruction-following, and agent capabilities.

Large Language Model

Mlabonne Qwen3 4B Abliterated GGUF

A quantized version of Qwen3-4B-abliterated, produced with llama.cpp in multiple quantization types and suitable for text generation tasks.

Large Language Model

Qwen Qwen3 1.7B GGUF

A quantized version based on Qwen/Qwen3-1.7B, using llama.cpp for quantization, supporting multiple quantization types, suitable for text generation tasks.

Large Language Model

Dreamgen Lucid V1 Nemo GGUF

A quantized model based on dreamgen/lucid-v1-nemo, processed with llama.cpp for various quantization levels, suitable for text generation tasks.

Large Language Model English

Gemma 3 4b It Abliterated GGUF

This model is a GGUF-format version converted from mlabonne/gemma-3-4b-it-abliterated, suitable for running inference locally.

Large Language Model

Gemma 3 12b It GGUF

Gemma 3 12B is a large language model that provides a quantized version in GGUF format, suitable for local deployment and use.

Large Language Model

EXAONE Deep 2.4B AWQ

The EXAONE Deep series models excel in reasoning tasks such as mathematics and programming. This model is the 4-bit AWQ quantized version with 2.4 billion parameters.

Large Language Model

Transformers Supports Multiple Languages

Thedrummer Gemmasutra Small 4B V1 GGUF

Gemmasutra-Small-4B-v1 is a 4B-parameter text generation model, quantized with llama.cpp and available in several quantization variants.

Large Language Model

Internvl2 5 4B AWQ

InternVL2_5-4B-AWQ is the AWQ quantized version of InternVL2_5-4B using autoawq, supporting multilingual and multimodal tasks.

Transformers Other

Ozone Ai 0x Lite GGUF

A quantized version of the ozone-ai/0x-lite model that supports Chinese and English text generation tasks. It uses llama.cpp imatrix quantization and offers multiple quantization options to suit different hardware requirements.

Large Language Model Supports Multiple Languages

Thedrummer Gemmasutra 9B V1.1 GGUF

This is a quantized version of the TheDrummer/Gemmasutra-9B-v1.1 model, processed using llama.cpp and suitable for text generation tasks.

Large Language Model

Mt0 Xxl Mt Q4 K M GGUF

This model is a multilingual text generation model converted from bigscience/mt0-xxl-mt to GGUF format via llama.cpp, supporting various language tasks.

Large Language Model Supports Multiple Languages

Summllama3.1 8B GGUF

An 8B-parameter summarization model based on the Llama3 architecture, offering multiple quantization versions.

Large Language Model

FLUX.1 Schnell GGUF

FLUX.1-schnell is an efficient text-to-image generation model based on a diffusion model architecture, supporting English text input to generate high-quality images.

Text-to-Image English

Phi 3.5 Mini Instruct Uncensored GGUF

Phi-3.5-mini-instruct_Uncensored is a quantized language model suitable for a range of hardware configurations.

Large Language Model

FLUX.1 Schnell Quantized

A quantized version of FLUX.1-schnell, a text-to-image diffusion model, supporting multiple quantization precision options.

Text-to-Image English

This model is a sentence similarity model converted from BAAI/bge-m3 to GGUF format using llama.cpp via ggml.ai's GGUF-my-repo space.

Chronos T5 Tiny

Chronos is a family of pretrained time series forecasting models based on language model architectures, trained by quantizing and scaling time series into token sequences.

Chronos T5 Base

Chronos is a family of pretrained time series forecasting models based on language model architectures. Time series are transformed into token sequences for training, which lets the models produce probabilistic forecasts.
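The Chronos entries above describe scaling and quantizing a time series into tokens. A minimal sketch of that idea follows, assuming mean-absolute scaling and uniform bins; the function names, the bin count, and the bin range are illustrative assumptions, not the settings used by the Chronos models.

```python
import math


def tokenize_series(values, num_bins=10, low=-5.0, high=5.0):
    """Mean-scale a series, then map each value to a uniform bin index."""
    # Scale by the mean absolute value so series of different magnitudes
    # share one token vocabulary; fall back to 1.0 for an all-zero series.
    scale = sum(abs(v) for v in values) / len(values) or 1.0
    width = (high - low) / num_bins
    tokens = []
    for v in values:
        idx = math.floor((v / scale - low) / width)
        # Clamp out-of-range values into the first or last bin.
        tokens.append(min(max(idx, 0), num_bins - 1))
    return tokens, scale


def detokenize(tokens, scale, num_bins=10, low=-5.0, high=5.0):
    """Map tokens back to bin centers and undo the scaling."""
    width = (high - low) / num_bins
    return [(low + (t + 0.5) * width) * scale for t in tokens]


series = [10.0, 20.0, 30.0]
tokens, scale = tokenize_series(series)
print(tokens, scale)
print(detokenize(tokens, scale))
```

The round trip is lossy: each value comes back as the center of its bin, so a finer bin grid reduces the reconstruction error at the cost of a larger vocabulary.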

Llava V1.6 34B GGUF

LLaVA 1.6 34B is an open-source multimodal chatbot model developed by fine-tuning a large language model on multimodal instruction-following data. It supports image-to-text and text-to-text generation tasks.


© 2025 AIbase